Implementing a fault-tolerant service using state machine replication requires us to be able to view our service in terms of state machines. Let's understand the characteristics of a state machine.

Core components of a state machine#

state machine consists of two main components: state variables and commands.

State variables#

State variables of a state machine specify its state. These variables are any instances of data structures that provide details about the configuration of a server or information stored by them. They are like normal variables because they contain values.

Commands#

Commands of a state machine are programs, functions, or code that alter the state of the state machine. These commands can perform the following operations:

  1. Update values of state variables

  2. Give an output

A vital characteristic of these commands is that they are deterministic. We can assume that the program or code that implements these commands is deterministic. Another important characteristic of these commands is that their execution is atomic with other commands. These assumptions are necessary so that we can compare the result or output of the state machines with each other.

Point to ponder

Question

Since commands must be deterministic, doesn’t that exclude many use cases that require randomness?

Hide Answer

Yes, use cases where randomness is required are excluded from state machines. For example, the generation of a random number is a non-deterministic computation. If such a task is done within the command of a process, it does not qualify as a state machine.

However, we can redesign our system so that non-deterministic computation occurs outside the state machine. We will demonstrate this later in the Example 3: Process control section of this lesson, where the operation of reading from a sensor (which is non-deterministic) takes place outside the state machine. In such cases, the non-deterministic operation’s output can be the state machine’s input.

Note: We will deliberately refrain from discussing how a command will be implemented or invoked to ensure that our model gives a general understanding. We will only talk about the steps involved in a command using pseudocode or plain English. We will describe state machines by only listing their state variables, commands, and the steps in those commands.

Complementary components#

Now that we know what makes up a state machine, we'll discuss components of the state machine approach that will help us understand the interaction to and from a state machine.

Clients and requests#

State machines have clients. A state machine's client can request to execute its commands. The client specifies the state machine, the command's name, and the information needed by the command in a request.

Output#

A request produces an output when the command specified in the request produces an output. The output of a request (or command) can be to storage or memory hardware, a controlling device that controls hardware, or other clients. We can also refer to this output as the output of the state machine that processes the request.

This diagram enumerates all components required to model a system as a state machine
This diagram enumerates all components required to model a system as a state machine

In the illustration above, the state machine has two state variables:

  • arr: An array that stores integers.

  • L: Represents the length of arr.

It has three commands:

  • arr(val): Places val at the smallest available index in arr that is not in use. It returns -1 if no index is available.

  • update(val, ind): Updates the value at index ind in arr to val. It returns 1 when done.

  • read(val): Returns the value stored at index ind of arr.

The diagram shows that its clients send requests in the order insert(3), update(5, 2), and read(2). The “Output(s)” box shows outputs that the state machine returns after processing these requests. We can see that the first output is -1, so the insert(3) command could not find an index to place 3. The next output is 1, which indicates that update(5, 2) is executed successfully. And lastly, the output 5 is for the read(2) command.

We will briefly discuss how state machine process requests in the next lesson. In the following lessons, we will discuss this in more detail.

Request processing by state machines#

State machines process one request at a time. The order in which state machines process requests is consistent with potential causality. The two important assumptions that a state machine’s clients can make about the order in which it processes requests are as follows:

  1. [O1O1] A state machine processes requests by one of its clients in the order the client issued the requests. For example, a client cc issues requests to a state machine smsm in the following order: request AA, then request BB, and then request CC. In this case, smsm will process AA first, then BB, and then CC.

  2. [O2O2] If a client c0c_0​ makes a request r0r_0​ to state machine smsm that causes another client c1c_1​ to make a request r1r_1​ to smsm, then smsm will process r0r_0​ before r1r_1​.

These two assumptions do not necessarily mean that smsm will process requests in the order they were made or received. Here is an example:

Created with Fabric.js 3.6.6
In this example, two clients c0 and c1, issue requests in the order specified in the respective request boxes (left to right). Also, the request r1C by c1 causes request r0B by c0.

1 of 8

Created with Fabric.js 3.6.6
Due to network delays, our state machine receives requests in the order specified in the request feed box (left to right).

2 of 8

Created with Fabric.js 3.6.6
The first request our state machine receives is r1B. It does not process r1B immediately since c1 made request r1A before request r1B.

3 of 8

Created with Fabric.js 3.6.6
On receiving r1A, it processes both r1A and r1B since they were made by c1 in that order.

4 of 8

Created with Fabric.js 3.6.6
The state machine then receives the request r0B and does not process it since an unprocessed request r1C caused it. Also, c0 made r0A before it made request r0B. r1C and r0A will both be processed before r0B.

5 of 8

Created with Fabric.js 3.6.6
The state machine processes r1C immediately.

6 of 8

Created with Fabric.js 3.6.6
The state machine processes r0A immediately. Since both r1C and r0A have been processed, it also processes r0B.

7 of 8

Created with Fabric.js 3.6.6
The last remaining request in the feed, r0C, is received and processed immediately.

8 of 8

In the example above, the request r1Cr_{1C} causes the request r0Br_{0B}, so the state machine processes r1Cr_{1C} before it processes r0Br_{0B}. But it is also important to note that r0Br_{0B} will only be processed once the state machine processes r0Ar_{0A} since the client c0c_0 made the request r0Ar_{0A} before r0Br_{0B}.

Point to ponder

Question

In the context of the above example, it is possible that two requests that arrive at a state machine might be concurrent (meaning there is no causality link between them). That might mean that the state machine can run them in any order. But doing so might break the consistency of replicas because one replica might order concurrent requests one way while the other does it differently. What are your thoughts?

Hide Answer

We cannot allow different orders at different replicas. We need to totally order all the requests (including any concurrent requests with no causality link between them).

One possible solution might be that each replica orders concurrent requests by sending a client ID.

In practice, a leader gives a specific order to the requests and then forwards them to all the replicas. Later in this chapter, we’ll see how having a single leader and funneling all requests through it is a practical and common way to achieve atomic broadcast.

Semantic characterization of a state machine#

The defining characteristic of a state machine is that it gives a deterministic computation that reads and processes a stream of requests, and processing a request might or might not produce an output. In other words, given an initial state, only the sequence of requests processed by a state machine completely determines its output.

Point to ponder

Question

Since state machines process one request at a time, does that lead to performance limitations?

Hide Answer

Yes, processing one request at a time can lead to performance limitations since we need to wait for a request to be processed before the next one so that the state machine can transition safely from one consistent state to the next.

With processing one request at a time, our state machine replicas process requests in the same order and end up in the same state while producing the same output. The state of all replicas remains consistent. With multiple requests being processed simultaneously, achieving this would take much work.

The potential loss of performance is the price we pay to get the fault tolerance via consistent state machine replicas. However if the performance is not acceptable for some applications, we might need to partition the state. In this case, each shard is managed by a different set of state machine replica groups. However, particular care must be taken when partitioning so that interaction between shards is either non-existent or minimal to achieve better performance.

Some examples of state machines#

Now that we have learned about state machines, let's look at some examples:

Example 1: A memory state machine#

Here is our first state machine example. Notice how we have described it in terms of its state variables and commands.

A memory state machine

Let's describe the components of the memory state machine. It has one state variable:

  • store: An array of size n.

It has two commands:

  • read(loc): Returns (to the state machine's client) the value stored in store at index loc. The loc index is a required input of read.

  • write(loc, value): Updates store at index loc with value. Both loc and value are required inputs for write.

Example 2: A mutex#

This state machine ensures that only a single client can access a resource at a time.

A resource allocator

The state machine described in the code widget uses two helper functions:

  • head(list_): Returns the first element of a list list_.

  • tail(list_): Returns list_ without it's first element.

Let's describe the components of the mutex state machine. It has two state variables:

  • user: Holds the current user's ID with access to the resource.

  • waiting: Lists users awaiting access to the resource. The list contains the IDs of users.

It has two commands:

  • acquire(): Appends the current client's ID to the waiting list if user is not empty. Otherwise, it makes the current client the user.

  • release(): Gives access to the client next in line (represented by the head of the waiting list head(waiting)) if the waiting list is not empty. If the list is empty, it sets user to None (null).

Example 3: Process control#

Process control involves monitoring some values for some purpose. One purpose is to ensure that systems operate within safe limitations, for example, monitoring pipeline pressure in natural gas distribution. Sensors can monitor the value. In this example, we use a sensor to periodically check a value and send it to our state machine.

A process control state machine

Let's describe the components of the process_control state machine. It has one state variable:

  • q: A real number.

It has one command:

  • adjust: Performs a function Foo on q and sensor_val and updates the value of q. It then sends the new value of q to the actuator.

Semantic characterization#

The process monitor is not a part of the state machine pc. By keeping monitor out of process_control, we ensure that process_control adheres to the semantic characterization property. Here is why. If we had kept a command in process_control that loops, reads from the sensor, writes to the actuator and then waits, then the output of process_control would not be entirely determined by a sequence of requests made to process_control because it would also depend on the time interval of delay. The process_control state machine would produce different outputs for different delay intervals. Therefore, by moving monitor out of the state machine, we remove the impact of the time delay d on the output of the state machine process_control.

What’s next?#

Now that we understand state machines and can model our services as a state machine, we’ll see how to replicate state machines for fault tolerance in the next lesson.

Introduction to State Machine Replication

Replication and Coordination of State Machines